Search | VHL Regional Portal

SnakeLines: integrated set of computational pipelines for sequencing reads.

Budis, Jaroslav; Krampl, Werner; Kucharík, Marcel; Hekel, Rastislav; Goga, Adrián; Sitarcík, Jozef; Lichvár, Michal; Smol'ak, Dávid; Böhmer, Miroslav; Baláz, Andrej; Duris, Frantisek; Gazdarica, Juraj; Soltys, Katarína; Turna, Ján; Radvánszky, Ján; Szemes, Tomás.

J Integr Bioinform ; 20(3)2023 Sep 01.

Article in English | MEDLINE | ID: mdl-37602733

ABSTRACT

With the rapid growth of massively parallel sequencing technologies, ever more laboratories are utilising sequenced DNA fragments for genomic analyses. Interpretation of sequencing data is, however, strongly dependent on bioinformatics processing, which is often too demanding for clinicians and researchers without a computational background. Another problem is the reproducibility of computational analyses across separate computing centres with inconsistent versions of installed libraries and bioinformatics tools. We propose an easily extensible set of computational pipelines, called SnakeLines, for processing sequencing reads, including mapping, assembly, variant calling, viral identification, transcriptomics, and metagenomics analysis. Individual steps of an analysis, along with methods and their parameters, can be readily modified in a single configuration file. The provided pipelines are embedded in virtual environments that ensure isolation of required resources from the host operating system, rapid deployment, and reproducibility of analysis across different Unix-based platforms. SnakeLines is a powerful framework for the automation of bioinformatics analyses, with emphasis on simple set-up, modification, extensibility, and reproducibility. The framework is already routinely used in various research projects and their applications, especially in the Slovak national surveillance of SARS-CoV-2.

Subjects

Genomics; Software; Reproducibility of Results; Genomics/methods; Computational Biology/methods; High-Throughput Nucleotide Sequencing/methods

WarpSTR: determining tandem repeat lengths using raw nanopore signals.

Sitarcík, Jozef; Vinar, Tomás; Brejová, Brona; Krampl, Werner; Budis, Jaroslav; Radvánszky, Ján; Lucká, Mária.

Bioinformatics ; 39(6)2023 06 01.

Article in English | MEDLINE | ID: mdl-37326967

ABSTRACT

MOTIVATION: Short tandem repeats (STRs) are regions of a genome containing many consecutive copies of the same short motif, possibly with small variations. Analysis of STRs has many clinical uses but is limited by technology, mainly because STRs exceed the read length used. Nanopore sequencing, as one of the long-read sequencing technologies, produces very long reads, thus offering more possibilities to study and analyze STRs. Basecalling of nanopore reads is, however, particularly unreliable in repeating regions, and therefore direct analysis from raw nanopore data is required. RESULTS: Here, we present WarpSTR, a novel method for characterizing both simple and complex tandem repeats directly from raw nanopore signals using a finite-state automaton and a search algorithm analogous to dynamic time warping. By applying this approach to determine the lengths of 241 STRs, we demonstrate that our approach decreases the mean absolute error of the STR length estimate compared to basecalling and STRique. AVAILABILITY AND IMPLEMENTATION: WarpSTR is freely available at https://github.com/fmfi-compbio/warpstr.
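The abstract describes the search algorithm only as "analogous to dynamic time warping". For readers unfamiliar with that technique, the following is a minimal sketch of classic dynamic time warping between two numeric sequences. It is not WarpSTR's implementation: the function name `dtw_distance`, the absolute-difference cost, and the toy inputs are assumptions chosen for illustration, and the finite-state automaton mentioned in the abstract is not modelled here.

```python
from math import inf


def dtw_distance(signal, reference):
    """Classic dynamic time warping cost between two numeric sequences.

    Illustrative only: the absolute-difference cost and the unconstrained
    warping path are assumptions, not details taken from the abstract.
    """
    n, m = len(signal), len(reference)
    # cost[i][j] = best cost of aligning signal[:i] with reference[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(signal[i - 1] - reference[j - 1])
            # Extend the cheapest of the three allowed predecessor cells:
            # repeat a reference point, repeat a signal point, or advance both.
            cost[i][j] = step + min(
                cost[i - 1][j],
                cost[i][j - 1],
                cost[i - 1][j - 1],
            )
    return cost[n][m]


# A stretched copy of the same shape aligns at zero cost.
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
```

Because warping lets one point match several points in the other sequence, the stretched example above costs nothing, which is the property that makes this family of methods attractive for signals whose duration per base varies.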

Subjects

Nanopores; High-Throughput Nucleotide Sequencing/methods; Genome; Algorithms; Microsatellite Repeats; Sequence Analysis, DNA

Systematic Genomic Surveillance of SARS-CoV-2 Virus on Illumina Sequencing Platforms in the Slovak Republic-One Year Experience.

Rusnáková, Diana; Sedlácková, Tatiana; Radvák, Peter; Böhmer, Miroslav; Misenko, Pavol; Budis, Jaroslav; Bokorová, Silvia; Lipková, Nikola; Forgácová-Jakúbková, Michaela; Sládecek, Tomás; Sitarcík, Jozef; Krampl, Werner; Gaziová, Michaela; Kalináková, Anna; Staronová, Edita; Tichá, Elena; Vráblová, Terézia; Sevcíková, Lucia; Kotvasová, Barbora; Madarová, Lucia; Feiková, Sona; Benová, Kristína; Reizigová, Lenka; Onderková, Zuzana; Ondrusková, Dorota; Loderer, Dusan; Skerenová, Mária; Danková, Zuzana; Janíková, Katarína; Halasová, Erika; Nováková, Elena; Turna, Ján; Szemes, Tomás.

Viruses ; 14(11)2022 11 02.

Article in English | MEDLINE | ID: mdl-36366530

ABSTRACT

To explore the genomic pool of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) during the pandemic, the Ministry of Health of the Slovak Republic formed a genomics surveillance workgroup, and the Public Health Authority of the Slovak Republic launched a systematic national epidemiological surveillance using whole-genome sequencing (WGS). Six out of seven genomic centers implementing Illumina sequencing technology were involved in the national SARS-CoV-2 virus sequencing program. Here we analyze a total of 33,024 SARS-CoV-2 isolates collected from the Slovak population from 1 March 2021 to 31 March 2022 that were sequenced and analyzed in a consistent manner. Overall, 28,005 out of 30,793 successfully sequenced samples met the criteria to be deposited in the global GISAID database. During this period, we identified four variants of concern (VOC): Alpha (B.1.1.7), Beta (B.1.351), Delta (B.1.617.2) and Omicron (B.1.1.529). In detail, we observed 165 lineages in our dataset, with Alpha, Delta and Omicron dominating three major consecutive incidence waves. This study aims to describe the results of a routine but high-level SARS-CoV-2 genomic surveillance program. Our study of SARS-CoV-2 genomes in collaboration with the Public Health Authority of the Slovak Republic also helped to inform the public about the epidemiological situation during the pandemic.
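The sample counts quoted in the abstract can be turned into rates with simple arithmetic. The sketch below assumes that the 30,793 successfully sequenced samples are a subset of the 33,024 collected isolates, which the abstract implies but does not state outright; the variable and function names are illustrative.

```python
# Counts as reported in the abstract.
collected = 33_024   # isolates collected and analyzed
sequenced = 30_793   # successfully sequenced samples
deposited = 28_005   # samples meeting the GISAID deposition criteria


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Assumes `sequenced` is a subset of `collected` (implied, not stated).
sequencing_success = percent(sequenced, collected)  # 93.2
deposit_rate = percent(deposited, sequenced)        # 90.9
```

Under that assumption, roughly 93% of the collected isolates were sequenced successfully, and about 91% of those met the deposition criteria.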

Subjects

COVID-19; SARS-CoV-2; Humans; SARS-CoV-2/genetics; Slovakia/epidemiology; COVID-19/epidemiology; Genome, Viral; High-Throughput Nucleotide Sequencing; Genomics

SWSPM: A Novel Alignment-Free DNA Comparison Method Based on Signal Processing Approaches.

Farkas, Tomás; Sitarcík, Jozef; Brejová, Brona; Lucká, Mária.

Evol Bioinform Online ; 15: 1176934319849071, 2019.

Article in English | MEDLINE | ID: mdl-31210725

ABSTRACT

Computing similarity between 2 nucleotide sequences is one of the fundamental problems in bioinformatics. Current methods are based mainly on 2 major approaches: (1) sequence alignment, which is computationally expensive, and (2) faster, but less accurate, alignment-free methods based on various statistical summaries, for example, short word counts. We propose a new distance measure based on mathematical transforms from the domain of signal processing. To tolerate large-scale rearrangements in the sequences, the transform is computed across sliding windows. We compare our method on several data sets with current state-of-the-art alignment-free methods. Our method compares favorably in terms of accuracy and outperforms other methods in running time and memory requirements. In addition, it is massively scalable up to dozens of processing units without loss of performance due to communication overhead. Source files and sample data are available at https://bitbucket.org/fiitstubioinfo/swspm/src.
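To make the "short word counts" family of alignment-free methods mentioned in the abstract concrete, here is a minimal sketch of a k-mer count distance. It illustrates the baseline approach only, not the signal-processing transform that SWSPM proposes; the function names, the default word length `k=3`, and the Euclidean distance are assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_counts(sequence, k):
    """Count every overlapping word of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_distance(seq_a, seq_b, k=3):
    """Euclidean distance between the k-mer count vectors of two sequences.

    Illustrative baseline only; the choice of k and of the Euclidean metric
    is an assumption, not something specified in the abstract.
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # A Counter returns 0 for words it has never seen, so the union of
    # observed words is enough to compare the two count vectors.
    words = set(counts_a) | set(counts_b)
    return sqrt(sum((counts_a[w] - counts_b[w]) ** 2 for w in words))


identical = kmer_distance("ACGTACGT", "ACGTACGT")
```

No alignment is computed at any point: each sequence is reduced to a word-count summary in a single pass, which is why such methods are fast but can miss differences that an alignment would reveal.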
